KAIRUI WU

Introduction to the dataset

Prosper is an American peer-to-peer lending platform. The site allows borrowers to post a listing for a chosen loan amount and purpose. Investors are then given the opportunity to invest in loans of their choice. Prosper collects data on borrower details and provides risk ratings for investors.

The dataset comes from the P2P lending platform prosper.com. It contains 113937 observations, each describing one loan through 81 variables.

Univariate Plots Section

## 'data.frame':    113937 obs. of  81 variables:
##  $ ListingKey                         : Factor w/ 113066 levels "00003546482094282EF90E5",..: 7180 7193 6647 6669 6686 6689 6699 6706 6687 6687 ...
##  $ ListingNumber                      : int  193129 1209647 81716 658116 909464 1074836 750899 768193 1023355 1023355 ...
##  $ ListingCreationDate                : Factor w/ 113064 levels "2005-11-09 20:44:28.847000000",..: 14184 111894 6429 64760 85967 100310 72556 74019 97834 97834 ...
##  $ CreditGrade                        : Factor w/ 8 levels "A","AA","B","C",..: 4 NA 7 NA NA NA NA NA NA NA ...
##  $ Term                               : int  36 36 36 36 36 60 36 36 36 36 ...
##  $ LoanStatus                         : Factor w/ 12 levels "Cancelled","Chargedoff",..: 3 4 3 4 4 4 4 4 4 4 ...
##  $ ClosedDate                         : Factor w/ 2802 levels "2005-11-25 00:00:00",..: 1137 NA 1262 NA NA NA NA NA NA NA ...
##  $ BorrowerAPR                        : num  0.165 0.12 0.283 0.125 0.246 ...
##  $ BorrowerRate                       : num  0.158 0.092 0.275 0.0974 0.2085 ...
##  $ LenderYield                        : num  0.138 0.082 0.24 0.0874 0.1985 ...
##  $ EstimatedEffectiveYield            : num  NA 0.0796 NA 0.0849 0.1832 ...
##  $ EstimatedLoss                      : num  NA 0.0249 NA 0.0249 0.0925 ...
##  $ EstimatedReturn                    : num  NA 0.0547 NA 0.06 0.0907 ...
##  $ ProsperRating..numeric.            : int  NA 6 NA 6 3 5 2 4 7 7 ...
##  $ ProsperRating..Alpha.              : Factor w/ 7 levels "A","AA","B","C",..: NA 1 NA 1 5 3 6 4 2 2 ...
##  $ ProsperScore                       : num  NA 7 NA 9 4 10 2 4 9 11 ...
##  $ ListingCategory..numeric.          : int  0 2 0 16 2 1 1 2 7 7 ...
##  $ BorrowerState                      : Factor w/ 51 levels "AK","AL","AR",..: 6 6 11 11 24 33 17 5 15 15 ...
##  $ Occupation                         : Factor w/ 67 levels "Accountant/CPA",..: 36 42 36 51 20 42 49 28 23 23 ...
##  $ EmploymentStatus                   : Factor w/ 8 levels "Employed","Full-time",..: 8 1 3 1 1 1 1 1 1 1 ...
##  $ EmploymentStatusDuration           : int  2 44 NA 113 44 82 172 103 269 269 ...
##  $ IsBorrowerHomeowner                : Factor w/ 2 levels "False","True": 2 1 1 2 2 2 1 1 2 2 ...
##  $ CurrentlyInGroup                   : Factor w/ 2 levels "False","True": 2 1 2 1 1 1 1 1 1 1 ...
##  $ GroupKey                           : Factor w/ 706 levels "00343376901312423168731",..: NA NA 334 NA NA NA NA NA NA NA ...
##  $ DateCreditPulled                   : Factor w/ 112992 levels "2005-11-09 00:30:04.487000000",..: 14347 111883 6446 64724 85857 100382 72500 73937 97888 97888 ...
##  $ CreditScoreRangeLower              : int  640 680 480 800 680 740 680 700 820 820 ...
##  $ CreditScoreRangeUpper              : int  659 699 499 819 699 759 699 719 839 839 ...
##  $ FirstRecordedCreditLine            : Factor w/ 11585 levels "1947-08-24 00:00:00",..: 8638 6616 8926 2246 9497 496 8264 7684 5542 5542 ...
##  $ CurrentCreditLines                 : int  5 14 NA 5 19 21 10 6 17 17 ...
##  $ OpenCreditLines                    : int  4 14 NA 5 19 17 7 6 16 16 ...
##  $ TotalCreditLinespast7years         : int  12 29 3 29 49 49 20 10 32 32 ...
##  $ OpenRevolvingAccounts              : int  1 13 0 7 6 13 6 5 12 12 ...
##  $ OpenRevolvingMonthlyPayment        : num  24 389 0 115 220 1410 214 101 219 219 ...
##  $ InquiriesLast6Months               : int  3 3 0 0 1 0 0 3 1 1 ...
##  $ TotalInquiries                     : num  3 5 1 1 9 2 0 16 6 6 ...
##  $ CurrentDelinquencies               : int  2 0 1 4 0 0 0 0 0 0 ...
##  $ AmountDelinquent                   : num  472 0 NA 10056 0 ...
##  $ DelinquenciesLast7Years            : int  4 0 0 14 0 0 0 0 0 0 ...
##  $ PublicRecordsLast10Years           : int  0 1 0 0 0 0 0 1 0 0 ...
##  $ PublicRecordsLast12Months          : int  0 0 NA 0 0 0 0 0 0 0 ...
##  $ RevolvingCreditBalance             : num  0 3989 NA 1444 6193 ...
##  $ BankcardUtilization                : num  0 0.21 NA 0.04 0.81 0.39 0.72 0.13 0.11 0.11 ...
##  $ AvailableBankcardCredit            : num  1500 10266 NA 30754 695 ...
##  $ TotalTrades                        : num  11 29 NA 26 39 47 16 10 29 29 ...
##  $ TradesNeverDelinquent..percentage. : num  0.81 1 NA 0.76 0.95 1 0.68 0.8 1 1 ...
##  $ TradesOpenedLast6Months            : num  0 2 NA 0 2 0 0 0 1 1 ...
##  $ DebtToIncomeRatio                  : num  0.17 0.18 0.06 0.15 0.26 0.36 0.27 0.24 0.25 0.25 ...
##  $ IncomeRange                        : Factor w/ 8 levels "$0","$1-24,999",..: 4 5 7 4 3 3 4 4 4 4 ...
##  $ IncomeVerifiable                   : Factor w/ 2 levels "False","True": 2 2 2 2 2 2 2 2 2 2 ...
##  $ StatedMonthlyIncome                : num  3083 6125 2083 2875 9583 ...
##  $ LoanKey                            : Factor w/ 113066 levels "00003683605746079487FF7",..: 100337 69837 46303 70776 71387 86505 91250 5425 908 908 ...
##  $ TotalProsperLoans                  : int  NA NA NA NA 1 NA NA NA NA NA ...
##  $ TotalProsperPaymentsBilled         : int  NA NA NA NA 11 NA NA NA NA NA ...
##  $ OnTimeProsperPayments              : int  NA NA NA NA 11 NA NA NA NA NA ...
##  $ ProsperPaymentsLessThanOneMonthLate: int  NA NA NA NA 0 NA NA NA NA NA ...
##  $ ProsperPaymentsOneMonthPlusLate    : int  NA NA NA NA 0 NA NA NA NA NA ...
##  $ ProsperPrincipalBorrowed           : num  NA NA NA NA 11000 NA NA NA NA NA ...
##  $ ProsperPrincipalOutstanding        : num  NA NA NA NA 9948 ...
##  $ ScorexChangeAtTimeOfListing        : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ LoanCurrentDaysDelinquent          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ LoanFirstDefaultedCycleNumber      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ LoanMonthsSinceOrigination         : int  78 0 86 16 6 3 11 10 3 3 ...
##  $ LoanNumber                         : int  19141 134815 6466 77296 102670 123257 88353 90051 121268 121268 ...
##  $ LoanOriginalAmount                 : int  9425 10000 3001 10000 15000 15000 3000 10000 10000 10000 ...
##  $ LoanOriginationDate                : Factor w/ 1873 levels "2005-11-15 00:00:00",..: 426 1866 260 1535 1757 1821 1649 1666 1813 1813 ...
##  $ LoanOriginationQuarter             : Factor w/ 33 levels "Q1 2006","Q1 2007",..: 18 8 2 32 24 33 16 16 33 33 ...
##  $ MemberKey                          : Factor w/ 90831 levels "00003397697413387CAF966",..: 11071 10302 33781 54939 19465 48037 60448 40951 26129 26129 ...
##  $ MonthlyLoanPayment                 : num  330 319 123 321 564 ...
##  $ LP_CustomerPayments                : num  11396 0 4187 5143 2820 ...
##  $ LP_CustomerPrincipalPayments       : num  9425 0 3001 4091 1563 ...
##  $ LP_InterestandFees                 : num  1971 0 1186 1052 1257 ...
##  $ LP_ServiceFees                     : num  -133.2 0 -24.2 -108 -60.3 ...
##  $ LP_CollectionFees                  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LP_GrossPrincipalLoss              : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LP_NetPrincipalLoss                : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LP_NonPrincipalRecoverypayments    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ PercentFunded                      : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ Recommendations                    : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ InvestmentFromFriendsCount         : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ InvestmentFromFriendsAmount        : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Investors                          : int  258 1 41 158 20 1 1 1 1 1 ...
##                    ListingKey     ListingNumber    
##  17A93590655669644DB4C06:     6   Min.   :      4  
##  349D3587495831350F0F648:     4   1st Qu.: 400919  
##  47C1359638497431975670B:     4   Median : 600554  
##  8474358854651984137201C:     4   Mean   : 627886  
##  DE8535960513435199406CE:     4   3rd Qu.: 892634  
##  04C13599434217079754AEE:     3   Max.   :1255725  
##  (Other)                :113912                    
##                     ListingCreationDate  CreditGrade         Term      
##  2013-10-02 17:20:16.550000000:     6   C      : 5649   Min.   :12.00  
##  2013-08-28 20:31:41.107000000:     4   D      : 5153   1st Qu.:36.00  
##  2013-09-08 09:27:44.853000000:     4   B      : 4389   Median :36.00  
##  2013-12-06 05:43:13.830000000:     4   AA     : 3509   Mean   :40.83  
##  2013-12-06 11:44:58.283000000:     4   HR     : 3508   3rd Qu.:36.00  
##  2013-08-21 07:25:22.360000000:     3   (Other): 6745   Max.   :60.00  
##  (Other)                      :113912   NA's   :84984                  
##                  LoanStatus                  ClosedDate   
##  Current              :56576   2014-03-04 00:00:00:  105  
##  Completed            :38074   2014-02-19 00:00:00:  100  
##  Chargedoff           :11992   2014-02-11 00:00:00:   92  
##  Defaulted            : 5018   2012-10-30 00:00:00:   81  
##  Past Due (1-15 days) :  806   2013-02-26 00:00:00:   78  
##  Past Due (31-60 days):  363   (Other)            :54633  
##  (Other)              : 1108   NA's               :58848  
##   BorrowerAPR       BorrowerRate     LenderYield     
##  Min.   :0.00653   Min.   :0.0000   Min.   :-0.0100  
##  1st Qu.:0.15629   1st Qu.:0.1340   1st Qu.: 0.1242  
##  Median :0.20976   Median :0.1840   Median : 0.1730  
##  Mean   :0.21883   Mean   :0.1928   Mean   : 0.1827  
##  3rd Qu.:0.28381   3rd Qu.:0.2500   3rd Qu.: 0.2400  
##  Max.   :0.51229   Max.   :0.4975   Max.   : 0.4925  
##  NA's   :25                                          
##  EstimatedEffectiveYield EstimatedLoss   EstimatedReturn 
##  Min.   :-0.183          Min.   :0.005   Min.   :-0.183  
##  1st Qu.: 0.116          1st Qu.:0.042   1st Qu.: 0.074  
##  Median : 0.162          Median :0.072   Median : 0.092  
##  Mean   : 0.169          Mean   :0.080   Mean   : 0.096  
##  3rd Qu.: 0.224          3rd Qu.:0.112   3rd Qu.: 0.117  
##  Max.   : 0.320          Max.   :0.366   Max.   : 0.284  
##  NA's   :29084           NA's   :29084   NA's   :29084   
##  ProsperRating..numeric. ProsperRating..Alpha.  ProsperScore  
##  Min.   :1.000           C      :18345         Min.   : 1.00  
##  1st Qu.:3.000           B      :15581         1st Qu.: 4.00  
##  Median :4.000           A      :14551         Median : 6.00  
##  Mean   :4.072           D      :14274         Mean   : 5.95  
##  3rd Qu.:5.000           E      : 9795         3rd Qu.: 8.00  
##  Max.   :7.000           (Other):12307         Max.   :11.00  
##  NA's   :29084           NA's   :29084         NA's   :29084  
##  ListingCategory..numeric. BorrowerState                 Occupation   
##  Min.   : 0.000            CA     :14717   Other              :28617  
##  1st Qu.: 1.000            TX     : 6842   Professional       :13628  
##  Median : 1.000            NY     : 6729   Computer Programmer: 4478  
##  Mean   : 2.774            FL     : 6720   Executive          : 4311  
##  3rd Qu.: 3.000            IL     : 5921   Teacher            : 3759  
##  Max.   :20.000            (Other):67493   (Other)            :55556  
##                            NA's   : 5515   NA's               : 3588  
##       EmploymentStatus EmploymentStatusDuration IsBorrowerHomeowner
##  Employed     :67322   Min.   :  0.00           False:56459        
##  Full-time    :26355   1st Qu.: 26.00           True :57478        
##  Self-employed: 6134   Median : 67.00                              
##  Not available: 5347   Mean   : 96.07                              
##  Other        : 3806   3rd Qu.:137.00                              
##  (Other)      : 2718   Max.   :755.00                              
##  NA's         : 2255   NA's   :7625                                
##  CurrentlyInGroup                    GroupKey     
##  False:101218     783C3371218786870A73D20:  1140  
##  True : 12719     3D4D3366260257624AB272D:   916  
##                   6A3B336601725506917317E:   698  
##                   FEF83377364176536637E50:   611  
##                   C9643379247860156A00EC0:   342  
##                   (Other)                :  9634  
##                   NA's                   :100596  
##             DateCreditPulled  CreditScoreRangeLower CreditScoreRangeUpper
##  2013-12-23 09:38:12:     6   Min.   :  0.0         Min.   : 19.0        
##  2013-11-21 09:09:41:     4   1st Qu.:660.0         1st Qu.:679.0        
##  2013-12-06 05:43:16:     4   Median :680.0         Median :699.0        
##  2014-01-14 20:17:49:     4   Mean   :685.6         Mean   :704.6        
##  2014-02-09 12:14:41:     4   3rd Qu.:720.0         3rd Qu.:739.0        
##  2013-09-27 22:04:54:     3   Max.   :880.0         Max.   :899.0        
##  (Other)            :113912   NA's   :591           NA's   :591          
##         FirstRecordedCreditLine CurrentCreditLines OpenCreditLines
##  1993-12-01 00:00:00:   185     Min.   : 0.00      Min.   : 0.00  
##  1994-11-01 00:00:00:   178     1st Qu.: 7.00      1st Qu.: 6.00  
##  1995-11-01 00:00:00:   168     Median :10.00      Median : 9.00  
##  1990-04-01 00:00:00:   161     Mean   :10.32      Mean   : 9.26  
##  1995-03-01 00:00:00:   159     3rd Qu.:13.00      3rd Qu.:12.00  
##  (Other)            :112389     Max.   :59.00      Max.   :54.00  
##  NA's               :   697     NA's   :7604       NA's   :7604   
##  TotalCreditLinespast7years OpenRevolvingAccounts
##  Min.   :  2.00             Min.   : 0.00        
##  1st Qu.: 17.00             1st Qu.: 4.00        
##  Median : 25.00             Median : 6.00        
##  Mean   : 26.75             Mean   : 6.97        
##  3rd Qu.: 35.00             3rd Qu.: 9.00        
##  Max.   :136.00             Max.   :51.00        
##  NA's   :697                                     
##  OpenRevolvingMonthlyPayment InquiriesLast6Months TotalInquiries   
##  Min.   :    0.0             Min.   :  0.000      Min.   :  0.000  
##  1st Qu.:  114.0             1st Qu.:  0.000      1st Qu.:  2.000  
##  Median :  271.0             Median :  1.000      Median :  4.000  
##  Mean   :  398.3             Mean   :  1.435      Mean   :  5.584  
##  3rd Qu.:  525.0             3rd Qu.:  2.000      3rd Qu.:  7.000  
##  Max.   :14985.0             Max.   :105.000      Max.   :379.000  
##                              NA's   :697          NA's   :1159     
##  CurrentDelinquencies AmountDelinquent   DelinquenciesLast7Years
##  Min.   : 0.0000      Min.   :     0.0   Min.   : 0.000         
##  1st Qu.: 0.0000      1st Qu.:     0.0   1st Qu.: 0.000         
##  Median : 0.0000      Median :     0.0   Median : 0.000         
##  Mean   : 0.5921      Mean   :   984.5   Mean   : 4.155         
##  3rd Qu.: 0.0000      3rd Qu.:     0.0   3rd Qu.: 3.000         
##  Max.   :83.0000      Max.   :463881.0   Max.   :99.000         
##  NA's   :697          NA's   :7622       NA's   :990            
##  PublicRecordsLast10Years PublicRecordsLast12Months RevolvingCreditBalance
##  Min.   : 0.0000          Min.   : 0.000            Min.   :      0       
##  1st Qu.: 0.0000          1st Qu.: 0.000            1st Qu.:   3121       
##  Median : 0.0000          Median : 0.000            Median :   8549       
##  Mean   : 0.3126          Mean   : 0.015            Mean   :  17599       
##  3rd Qu.: 0.0000          3rd Qu.: 0.000            3rd Qu.:  19521       
##  Max.   :38.0000          Max.   :20.000            Max.   :1435667       
##  NA's   :697              NA's   :7604              NA's   :7604          
##  BankcardUtilization AvailableBankcardCredit  TotalTrades    
##  Min.   :0.000       Min.   :     0          Min.   :  0.00  
##  1st Qu.:0.310       1st Qu.:   880          1st Qu.: 15.00  
##  Median :0.600       Median :  4100          Median : 22.00  
##  Mean   :0.561       Mean   : 11210          Mean   : 23.23  
##  3rd Qu.:0.840       3rd Qu.: 13180          3rd Qu.: 30.00  
##  Max.   :5.950       Max.   :646285          Max.   :126.00  
##  NA's   :7604        NA's   :7544            NA's   :7544    
##  TradesNeverDelinquent..percentage. TradesOpenedLast6Months
##  Min.   :0.000                      Min.   : 0.000         
##  1st Qu.:0.820                      1st Qu.: 0.000         
##  Median :0.940                      Median : 0.000         
##  Mean   :0.886                      Mean   : 0.802         
##  3rd Qu.:1.000                      3rd Qu.: 1.000         
##  Max.   :1.000                      Max.   :20.000         
##  NA's   :7544                       NA's   :7544           
##  DebtToIncomeRatio         IncomeRange    IncomeVerifiable
##  Min.   : 0.000    $25,000-49,999:32192   False:  8669    
##  1st Qu.: 0.140    $50,000-74,999:31050   True :105268    
##  Median : 0.220    $100,000+     :17337                   
##  Mean   : 0.276    $75,000-99,999:16916                   
##  3rd Qu.: 0.320    Not displayed : 7741                   
##  Max.   :10.010    $1-24,999     : 7274                   
##  NA's   :8554      (Other)       : 1427                   
##  StatedMonthlyIncome                    LoanKey       TotalProsperLoans
##  Min.   :      0     CB1B37030986463208432A1:     6   Min.   :0.00     
##  1st Qu.:   3200     2DEE3698211017519D7333F:     4   1st Qu.:1.00     
##  Median :   4667     9F4B37043517554537C364C:     4   Median :1.00     
##  Mean   :   5608     D895370150591392337ED6D:     4   Mean   :1.42     
##  3rd Qu.:   6825     E6FB37073953690388BC56D:     4   3rd Qu.:2.00     
##  Max.   :1750003     0D8F37036734373301ED419:     3   Max.   :8.00     
##                      (Other)                :113912   NA's   :91852    
##  TotalProsperPaymentsBilled OnTimeProsperPayments
##  Min.   :  0.00             Min.   :  0.00       
##  1st Qu.:  9.00             1st Qu.:  9.00       
##  Median : 16.00             Median : 15.00       
##  Mean   : 22.93             Mean   : 22.27       
##  3rd Qu.: 33.00             3rd Qu.: 32.00       
##  Max.   :141.00             Max.   :141.00       
##  NA's   :91852              NA's   :91852        
##  ProsperPaymentsLessThanOneMonthLate ProsperPaymentsOneMonthPlusLate
##  Min.   : 0.00                       Min.   : 0.00                  
##  1st Qu.: 0.00                       1st Qu.: 0.00                  
##  Median : 0.00                       Median : 0.00                  
##  Mean   : 0.61                       Mean   : 0.05                  
##  3rd Qu.: 0.00                       3rd Qu.: 0.00                  
##  Max.   :42.00                       Max.   :21.00                  
##  NA's   :91852                       NA's   :91852                  
##  ProsperPrincipalBorrowed ProsperPrincipalOutstanding
##  Min.   :    0            Min.   :    0              
##  1st Qu.: 3500            1st Qu.:    0              
##  Median : 6000            Median : 1627              
##  Mean   : 8472            Mean   : 2930              
##  3rd Qu.:11000            3rd Qu.: 4127              
##  Max.   :72499            Max.   :23451              
##  NA's   :91852            NA's   :91852              
##  ScorexChangeAtTimeOfListing LoanCurrentDaysDelinquent
##  Min.   :-209.00             Min.   :   0.0           
##  1st Qu.: -35.00             1st Qu.:   0.0           
##  Median :  -3.00             Median :   0.0           
##  Mean   :  -3.22             Mean   : 152.8           
##  3rd Qu.:  25.00             3rd Qu.:   0.0           
##  Max.   : 286.00             Max.   :2704.0           
##  NA's   :95009                                        
##  LoanFirstDefaultedCycleNumber LoanMonthsSinceOrigination   LoanNumber    
##  Min.   : 0.00                 Min.   :  0.0              Min.   :     1  
##  1st Qu.: 9.00                 1st Qu.:  6.0              1st Qu.: 37332  
##  Median :14.00                 Median : 21.0              Median : 68599  
##  Mean   :16.27                 Mean   : 31.9              Mean   : 69444  
##  3rd Qu.:22.00                 3rd Qu.: 65.0              3rd Qu.:101901  
##  Max.   :44.00                 Max.   :100.0              Max.   :136486  
##  NA's   :96985                                                            
##  LoanOriginalAmount          LoanOriginationDate LoanOriginationQuarter
##  Min.   : 1000      2014-01-22 00:00:00:   491   Q4 2013:14450         
##  1st Qu.: 4000      2013-11-13 00:00:00:   490   Q1 2014:12172         
##  Median : 6500      2014-02-19 00:00:00:   439   Q3 2013: 9180         
##  Mean   : 8337      2013-10-16 00:00:00:   434   Q2 2013: 7099         
##  3rd Qu.:12000      2014-01-28 00:00:00:   339   Q3 2012: 5632         
##  Max.   :35000      2013-09-24 00:00:00:   316   Q2 2012: 5061         
##                     (Other)            :111428   (Other):60343         
##                    MemberKey      MonthlyLoanPayment LP_CustomerPayments
##  63CA34120866140639431C9:     9   Min.   :   0.0     Min.   :   -2.35   
##  16083364744933457E57FB9:     8   1st Qu.: 131.6     1st Qu.: 1005.76   
##  3A2F3380477699707C81385:     8   Median : 217.7     Median : 2583.83   
##  4D9C3403302047712AD0CDD:     8   Mean   : 272.5     Mean   : 4183.08   
##  739C338135235294782AE75:     8   3rd Qu.: 371.6     3rd Qu.: 5548.40   
##  7E1733653050264822FAA3D:     8   Max.   :2251.5     Max.   :40702.39   
##  (Other)                :113888                                         
##  LP_CustomerPrincipalPayments LP_InterestandFees LP_ServiceFees   
##  Min.   :    0.0              Min.   :   -2.35   Min.   :-664.87  
##  1st Qu.:  500.9              1st Qu.:  274.87   1st Qu.: -73.18  
##  Median : 1587.5              Median :  700.84   Median : -34.44  
##  Mean   : 3105.5              Mean   : 1077.54   Mean   : -54.73  
##  3rd Qu.: 4000.0              3rd Qu.: 1458.54   3rd Qu.: -13.92  
##  Max.   :35000.0              Max.   :15617.03   Max.   :  32.06  
##                                                                   
##  LP_CollectionFees  LP_GrossPrincipalLoss LP_NetPrincipalLoss
##  Min.   :-9274.75   Min.   :  -94.2       Min.   : -954.5    
##  1st Qu.:    0.00   1st Qu.:    0.0       1st Qu.:    0.0    
##  Median :    0.00   Median :    0.0       Median :    0.0    
##  Mean   :  -14.24   Mean   :  700.4       Mean   :  681.4    
##  3rd Qu.:    0.00   3rd Qu.:    0.0       3rd Qu.:    0.0    
##  Max.   :    0.00   Max.   :25000.0       Max.   :25000.0    
##                                                              
##  LP_NonPrincipalRecoverypayments PercentFunded    Recommendations   
##  Min.   :    0.00                Min.   :0.7000   Min.   : 0.00000  
##  1st Qu.:    0.00                1st Qu.:1.0000   1st Qu.: 0.00000  
##  Median :    0.00                Median :1.0000   Median : 0.00000  
##  Mean   :   25.14                Mean   :0.9986   Mean   : 0.04803  
##  3rd Qu.:    0.00                3rd Qu.:1.0000   3rd Qu.: 0.00000  
##  Max.   :21117.90                Max.   :1.0125   Max.   :39.00000  
##                                                                     
##  InvestmentFromFriendsCount InvestmentFromFriendsAmount   Investors      
##  Min.   : 0.00000           Min.   :    0.00            Min.   :   1.00  
##  1st Qu.: 0.00000           1st Qu.:    0.00            1st Qu.:   2.00  
##  Median : 0.00000           Median :    0.00            Median :  44.00  
##  Mean   : 0.02346           Mean   :   16.55            Mean   :  80.48  
##  3rd Qu.: 0.00000           3rd Qu.:    0.00            3rd Qu.: 115.00  
##  Max.   :33.00000           Max.   :25000.00            Max.   :1189.00  
## 

Q1. How has Prosper’s business developed?

Based on the plot, the number of listed loans decreased sharply from 2008 to 2009. This may be due to the global economic downturn, during which people had less spare money to invest and less confidence in the market. After 2009, as P2P lending grew in popularity and the economy stabilized, people regained confidence in the market and became more willing to lend, so the number of listed loans increased year by year. Prosper’s business appears to have been on a stable upward trend in recent years.

Q2. How is the lender yield distributed on Prosper?

Based on the plot, loans with a lender yield between 0.125 and 0.175 were the most common. As the lender yield rises above 0.175, the number of listed loans decreases.

Q3. What is the distribution of loan purposes on Prosper?

Loans for debt consolidation are far more numerous than loans for any other purpose. This suggests that Prosper is not the first choice for most people seeking a loan; most borrowers use Prosper to pay off other loans.

Q4. What is the distribution of loan amounts on Prosper?

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1000    4000    6500    8337   12000   35000

Based on the plot, loans of 1000 to 5000 are far more numerous than loans of other amounts. 35000 may be the maximum loan amount allowed on Prosper.

Q5. What are the counts of loan results on Prosper?

For this question, we created the newStatus variable to divide loans into two groups: good and bad. We regarded the statuses “Completed” and “Current” as “good loans” and all other statuses as “bad loans”.

##  [1] "Completed"              "Current"               
##  [3] "Past Due (1-15 days)"   "Defaulted"             
##  [5] "Chargedoff"             "Past Due (16-30 days)" 
##  [7] "Cancelled"              "Past Due (61-90 days)" 
##  [9] "Past Due (31-60 days)"  "Past Due (91-120 days)"
## [11] "FinalPaymentInProgress" "Past Due (>120 days)"
## [1] "Good" "Bad"
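The recoding described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the code used for the report: the data frame name `loans` and the six toy rows are hypothetical, and only the column name `LoanStatus` and the variable name `newStatus` come from the text.

```r
# Toy stand-in for the loan data (hypothetical rows; the real data has 113937).
loans <- data.frame(
  LoanStatus = c("Completed", "Current", "Chargedoff", "Defaulted",
                 "Past Due (1-15 days)", "Cancelled"),
  stringsAsFactors = FALSE
)

# Statuses treated as "Good"; every other status is treated as "Bad".
good_statuses <- c("Completed", "Current")

loans$newStatus <- factor(
  ifelse(loans$LoanStatus %in% good_statuses, "Good", "Bad"),
  levels = c("Good", "Bad")
)

# Count of loans in each group.
table(loans$newStatus)
```

Listing the good statuses explicitly means any status not named there, including ones such as “Cancelled” or “FinalPaymentInProgress”, falls into the bad group by default.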

The number of good loans is nearly five times the number of bad loans (94650 versus 19287, from the LoanStatus counts above). This suggests that Prosper manages and operates its loans well and that the business carries low risk. Prosper appears to be developing well.

Q5. What is the distribution of borrowers’ employment status on Prosper?

Employed borrowers far outnumber unemployed borrowers, and full-time borrowers outnumber part-time borrowers. This suggests that Prosper’s lending criteria are strict and that most unemployed people cannot get a loan from Prosper.

Q6. What is the distribution of employment durations of borrowers on Prosper?

The longer the employment duration, the fewer the loans. This may be because people in entry-level jobs do not earn much and have to borrow money for other needs, or because people who start their own business early in their career need money to get started.

Q7. What is the distribution of borrowers’ income ranges?

Based on the plot, borrowers with an income between 25000 and 75000 outnumber the others. In addition, an annual income above 25000 dollars may make it easier to get a loan from Prosper, whereas being unemployed or having an annual income below 25000 dollars makes it harder.

Q8. What is the distribution of borrowers’ debt to income ratio on Prosper?

Based on the plot, debt to income ratios from 0.175 to 0.225 are the most common. As the ratio increases beyond this range, the count decreases.
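A range such as 0.175 to 0.225 corresponds to one histogram bin of width 0.05. The sketch below shows how such bin counts could be tabulated in base R. The bin width, the break positions and the toy values in `dti` are all assumptions made for illustration; the plot's actual settings are not given in the text.

```r
# Hypothetical debt to income ratios, including one missing value.
dti <- c(0.18, 0.20, 0.22, 0.30, 0.12, 0.21, NA)

# Assumed bins of width 0.05, placed so that one bin spans 0.175 to 0.225.
breaks <- seq(0.025, 0.525, by = 0.05)

# cut() assigns each value to a bin; table() drops the NA by default.
bins <- cut(dti, breaks = breaks)
bin_counts <- table(bins)

# Show only the non-empty bins.
bin_counts[bin_counts > 0]
```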

Bivariate Plots Section

Q9. Does the lender yield change over the years on Prosper?

The lender yield peaked in 2011 and kept decreasing after that year. The reason may be that P2P loans became increasingly popular, so more and more lenders wanted to lend their money to others through P2P platforms. As the market became more competitive, lenders accepted lower yields to ensure that their money could be lent successfully. The economic downturn in 2008 may also have influenced the lender yield.

Q10. Does the loan amount change over the years on Prosper?

With the increasing popularity of P2P loans and increasingly strict policy and supervision of them, people came to trust this kind of loan more, so more investors lent money on Prosper. This may be why the loan amount kept increasing over these years.

Q11. Does the borrowers’ credit score change over the years on Prosper?

Based on the plot, a new criterion for borrowers’ credit scores seems to have applied from 2009. As P2P loans grew in popularity, more people without a high credit score started applying for loans from Prosper, so the average credit score of borrowers decreased slightly after 2009.

Q12. Does the number of loans with each result change over the years?

The dataset does not contain complete data for 2005 and 2014, so I removed these two years when examining the loan status counts. Based on the plot, from 2009 to 2013 the count of ‘good loans’ increased sharply whereas the count of ‘bad loans’ stayed almost the same. This suggests that the market environment has been favorable in recent years.
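Dropping the incomplete years and counting loans per year and status could look like the following base R sketch. It assumes the year is taken from `LoanOriginationDate`; the text does not say which date column was used. The data frame `loans` and its five rows are hypothetical.

```r
# Toy stand-in for the loan data (hypothetical rows).
loans <- data.frame(
  LoanOriginationDate = c("2005-11-15 00:00:00", "2009-06-01 00:00:00",
                          "2009-07-10 00:00:00", "2013-10-16 00:00:00",
                          "2014-01-22 00:00:00"),
  newStatus = c("Good", "Good", "Bad", "Good", "Good"),
  stringsAsFactors = FALSE
)

# The year is the first four characters of the timestamp string.
loans$year <- as.integer(substr(loans$LoanOriginationDate, 1, 4))

# Keep only the years with complete data.
full_years <- subset(loans, !(year %in% c(2005, 2014)))

# Cross-tabulate year against loan status.
counts <- table(full_years$year, full_years$newStatus)
counts
```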

Q13. How does the percentage of ‘good loans’ differ across employment statuses?

From the plot, the percentage of good loans is higher for employed borrowers than for unemployed borrowers. Surprisingly, the percentage of good loans is lower for full-time borrowers than for part-time borrowers. Self-employed borrowers also performed better than full-time and part-time borrowers, and retired borrowers performed worst.
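A percentage of good loans per group can be computed as the mean of a logical vector within each group. The base R sketch below illustrates this under stated assumptions: the data frame `loans` and its six rows are hypothetical, and only the names `EmploymentStatus` and `newStatus` come from the text.

```r
# Toy stand-in for the loan data (hypothetical rows).
loans <- data.frame(
  EmploymentStatus = c("Employed", "Employed", "Employed", "Employed",
                       "Not employed", "Not employed"),
  newStatus = c("Good", "Good", "Good", "Bad", "Good", "Bad"),
  stringsAsFactors = FALSE
)

# The mean of TRUE/FALSE values is the share of TRUE, so this gives the
# percentage of "Good" loans within each employment status.
pct_good <- tapply(loans$newStatus == "Good", loans$EmploymentStatus, mean) * 100
pct_good
```

Comparing percentages rather than raw counts matters here because the groups differ greatly in size.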

Q14. How does the percentage of ‘good loans’ differ across income ranges?

Based on the plot, it is apparent that the percentage of good loans increases as the income range increases. The higher the borrower’s income, the lower the risk of the loan.

Q15. How does the employment duration differ between loans with different results?

The employment status duration of borrowers with good loans is slightly higher than that of borrowers with bad loans.

Q16. How does the debt to income ratio differ between loans with different results?

Based on the plot, the debt to income ratio does not appear to influence the loan status. The mean and the quantiles are almost the same for the two loan statuses. This may be because the loan review and approval process focuses heavily on the applicant’s debt to income ratio, so that the strict process prevents loans from being granted to people with a high debt to income ratio.
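The comparison of means and medians between the two loan statuses can be sketched in base R as below. The data frame `loans` and its values are hypothetical; missing ratios are skipped with `na.rm = TRUE`, since `DebtToIncomeRatio` contains NA's in the summary above.

```r
# Toy stand-in for the loan data (hypothetical rows, one missing ratio).
loans <- data.frame(
  DebtToIncomeRatio = c(0.10, 0.20, 0.30, NA, 0.15, 0.25, 0.35),
  newStatus = c("Good", "Good", "Good", "Good", "Bad", "Bad", "Bad"),
  stringsAsFactors = FALSE
)

# Median and mean of the ratio within each loan status, ignoring NA's.
medians <- tapply(loans$DebtToIncomeRatio, loans$newStatus, median, na.rm = TRUE)
means   <- tapply(loans$DebtToIncomeRatio, loans$newStatus, mean, na.rm = TRUE)

# One row per loan status, one column per statistic.
cbind(median = medians, mean = means)
```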

Q17. What is the relationship between loan amount and debt to income ratio?

Based on the plot, large loans are only given to borrowers with a debt to income ratio below 0.5. The higher the debt to income ratio, the harder it is to get such a loan. Loan amounts also cluster at round values such as 5000, 10000 and 15000, which are popular on Prosper.

Q18. How does BorrowerAPR differ across employment statuses?

Based on the plot, full-time and part-time employed borrowers have the lowest borrower APR. In contrast, unemployed borrowers have the highest borrower APR. The reason may be that some loans are limited to employed borrowers, while unemployed borrowers can apply only for a limited number of loans, which carry high rates.

Q19. How does the lender yield differ across Prosper ratings?

Based on the plot, the lender yield decreases as the borrower’s Prosper rating increases.

Multivariate Plots Section

Q20. What is the relationship among borrower APR, loan amount and Prosper rating?

Based on the plot, the borrower APR decreases as the loan amount increases. In addition, the higher the Prosper rating, the lower the APR.

Q21. What is the relationship between debt to income ratio and Prosper score in different groups of loans?

Based on the plot, when the Prosper score is low, the debt to income ratio is a significant indicator of the risk of a loan. When the Prosper score is high, however, the debt to income ratio matters much less for the risk of a loan.

Q22. What is the relationship between borrower APR and debt to income ratio in different groups of employment status?

Based on the plot, the borrower APR has a positive linear relationship with the debt to income ratio for employed borrowers, especially for full-time employed borrowers. For unemployed borrowers, the relationship is not apparent.

Q23. What is the relationship between borrower APR and employment duration in different groups of loans?

Based on the plot, when the employment duration is short, the borrower APR matters significantly for the loan risk. People with a short employment duration are less stable and have lower incomes. Therefore, when facing a high APR, these borrowers may lose the ability to repay the loan because of unemployment or low income. People with a long employment duration, however, generally have higher salaries and are more stable, so the APR becomes less important to the risk of the loan.


Final Plots and Summary

Plot One

Description One

Based on the plot, full-time and part-time employed borrowers have the lowest borrower APR. In contrast, unemployed borrowers have the highest borrower APR. The reason may be that some loans are limited to employed borrowers, while unemployed borrowers can apply only for a limited number of loans, which carry high rates.

Plot Two

Description Two

From the plot, the percentage of good loans is higher for employed borrowers than for unemployed borrowers. Surprisingly, the percentage of good loans is lower for full-time borrowers than for part-time borrowers. Self-employed borrowers also performed better than full-time and part-time borrowers, and retired borrowers performed worst.

Plot Three

Description Three

Based on the plot, when the employment duration is short, the borrower APR matters significantly for the loan risk. People with a short employment duration are less stable and have lower incomes. Therefore, when facing a high APR, these borrowers may lose the ability to repay the loan because of unemployment or low income. People with a long employment duration, however, generally have higher salaries and are more stable, so the APR becomes less important to the risk of the loan.


Reflection

In the analysis, the first difficulty I met was that the data were incomplete over time. I solved this by limiting the time period. The second was the complexity of the variables: some categorical variables have 8 to 10 labels. To handle this, I created several new variables that re-categorize the values and make the analysis easier. The third was my lack of knowledge of finance, which made some variables in this dataset hard to understand. I tried my best to understand the variables and ignored those that were not easy to understand.

For future work, there are some points that can be explored further. Firstly, the loan status was reduced to a simple representation in my research. Instead of just two labels (good, bad), we can use the original, more complex loan status variable for in-depth research. Besides, more features can be considered in future research, which may reveal more patterns and relationships between variables. In addition, predictive data analytics methods can be applied.